Efficient access methods for very large distributed graph databases
Abstract
• Indexing techniques are essential in large-scale subgraph searching.
• Three new indexing techniques are proposed, which leverage the use of bitmaps.
• A generic framework for filter-then-verify implementations on top of Apache Spark.
• The evaluation shows that different indexes are suitable for different query selectivities.
• A distributed approach is suitable for very large databases and low selective queries.

Subgraph searching is an essential problem in graph databases, but it is also challenging due to the involved subgraph isomorphism NP-Complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs); however, they cannot reach reasonable performance on very large databases (tens of millions of graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. Besides, three previous methods that improve the performance of GGSX and CT-Index are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a great performance improvement (between 70% and 90% time reduction) in a centralized configuration and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurations.
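The filter-then-verify idea with a bitmap index can be illustrated with a minimal, single-machine sketch. This is not the paper's implementation. The label-path features, the limit of three vertices per path, the use of Python integers as bitmaps, and every function name below are assumptions chosen for illustration. Each feature maps to a bitmap with one bit per database graph, filtering ANDs the bitmaps of the query's features, and only the surviving candidates go through a simple backtracking subgraph isomorphism test.

```python
def make_graph(labels, edges):
    """Build an undirected labelled graph as (labels, adjacency)."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def path_features(graph, max_len=3):
    """Label sequences of all simple paths with at most max_len vertices.

    Using label paths as features is an assumption made for this sketch.
    """
    labels, adj = graph
    feats = set()

    def walk(path):
        feats.add(tuple(labels[v] for v in path))
        if len(path) < max_len:
            for n in adj[path[-1]]:
                if n not in path:
                    walk(path + [n])

    for v in labels:
        walk([v])
    return feats


def build_index(db):
    """Map each feature to a bitmap (Python int); bit i is set if graph i has it."""
    index = {}
    for gid, g in enumerate(db):
        for f in path_features(g):
            index[f] = index.get(f, 0) | (1 << gid)
    return index


def filter_candidates(index, query, n_graphs):
    """Filtering stage: AND the bitmaps of every feature of the query."""
    bitmap = (1 << n_graphs) - 1
    for f in path_features(query):
        bitmap &= index.get(f, 0)
    return [gid for gid in range(n_graphs) if (bitmap >> gid) & 1]


def is_subgraph(query, graph):
    """Verification stage: naive backtracking test for a label-preserving,
    injective mapping that keeps every query edge (non-induced subgraph)."""
    q_labels, q_adj = query
    g_labels, g_adj = graph
    order = list(q_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        u = order[len(mapping)]
        for v in g_labels:
            if v in mapping.values() or g_labels[v] != q_labels[u]:
                continue
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                if extend(mapping):
                    return True
                del mapping[u]
        return False

    return extend({})


def search(db, index, query):
    """Filter with the bitmap index, then verify only the candidates."""
    candidates = filter_candidates(index, query, len(db))
    return [gid for gid in candidates if is_subgraph(query, db[gid])]


# Toy database (hypothetical data): vertex labels stand in for atom types.
db = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),          # chain C-C-O
    make_graph({0: "C", 1: "N"}, [(0, 1)]),                          # C-N
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2), (0, 2)]),  # triangle
    make_graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)]),          # chain C-C-C
]
index = build_index(db)

query = make_graph({0: "C", 1: "O"}, [(0, 1)])
triangle = make_graph({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2), (0, 2)])

print(search(db, index, query))                      # [0, 2]
print(filter_candidates(index, triangle, len(db)))   # [3]
print(search(db, index, triangle))                   # []
```

The second query shows why the verification stage is still required. The C-C-C chain contains every path feature of the carbon triangle, so it survives filtering, but it is rejected once the isomorphism test runs. Filtering is safe in the other direction: any simple path in the query maps onto a simple path with the same labels in a graph that contains the query, so a true answer is never pruned.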
Similar resources
Chapter 4 QUERY LANGUAGE AND ACCESS METHODS FOR GRAPH DATABASES
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We present a graph query language (GraphQL) that supports bulk operations on graphs with arbitrary structures and annotated attributes. In this language, graphs are the basic unit of information and each query manipula...
Mining Very Large Databases
Established companies have had decades to accumulate masses of data about their customers, suppliers, and products and services. The rapid pace of e-commerce means that Web startups can become huge enterprises in months, not years, amassing proportionately large databases as they grow. Data mining, also known as knowledge discovery in databases, gives organizations the tools to ...
Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and u...
Quality of Very Large Databases
Analyses and data mining of large computer files are affected by the quality of the information in the files. For large population registers and for files that are created by merging two or more files, duplicate entries must be identified. Duplicate identification can depend on record linkage software that can deal with name, address, and date-of-birth data containing many typographical errors....
Very Large Databases: How Large, How Different?
Soon, the world will need far more truly large databases than any of us ever imagined; yet, ironically, without a lot of care, VLDBs as we know them today may be left along the wayside. The way in which we think about, design and build enormous databases will have to completely change if we are to participate in this revolution. By now everybody, including database people, realizes that the c...
Journal
Journal title: Information Sciences
Year: 2021
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2021.05.047